A checkpointing-recovery scheme for domino free distributed systems
Abstract
Many communication induced checkpointing algorithms have been proposed for asynchronous cooperating processes. All of them suffer from overhead due both to the exchange of control information and to the insertion of local checkpoints additional to the basic ones. In this paper we propose a low overhead checkpointing-recovery scheme. It consists of a domino-free checkpointing algorithm plus an asynchronous recovery scheme. The main characteristic of the proposed checkpointing-recovery scheme is the minimization of the number of the overall local checkpoints. A simulation study is also presented which quantifies such a reduction.
Similar resources
Efficient Checkpoint-based Failure Recovery Techniques in Mobile Computing Systems
Conventional distributed and domino effect-free failure recovery techniques are inappropriate for mobile computing systems because each mobile host is forced to take a new checkpoint (based on coordinated checkpointing). Otherwise, multiple local checkpoints may need to be stored in stable storage (based on communication-induced checkpointing). Hence, this investigation presents a novel domino ...
Efficient Techniques for Adaptive Independent Checkpointing in Distributed Systems
This work presents two novel algorithms to prevent rollback propagation for independent checkpointing: an efficient adaptive independent checkpointing algorithm and an optimized adaptive independent checkpointing algorithm. The last opportunity strategy that yields a better performance than the conservation strategy is also employed to prevent useless checkpoints for both causal rewinding paths...
Characterization of Consistent Global Checkpoints in Large-Scale Distributed Systems
Backward error recovery is one of the most used schemes to ensure fault-tolerance in distributed systems. It consists, upon the occurrence of a failure, in restoring a distributed computation in an error-free global state from which it can be resumed to produce a correct behaviour. Checkpointing is one of the techniques to pursue the backward error recovery. As we consider large-scale distribut...
Distributed Recovery Units: An Approach for Hybrid and Adaptive Distributed Recovery
Traditionally, distributed recovery schemes have been designed for systems consisting of multiple recovery units. Each recovery unit (RU) resides on a single processor and it can fail and recover as a whole. This report introduces the "distributed recovery unit (DRU)" abstraction as an approach for design of "hybrid" and "adaptive" recovery schemes for distributed systems. The distributed syste...
Adaptive Communication-Induced Checkpointing Protocols with Domino-Effect Freedom
The domino effect is an important problem for the checkpointing and rollback recovery in distributed systems. Communication-induced checkpointing is one way of preventing domino effect. Most existing such protocols focus on guaranteeing that every checkpoint is part of a consistent global checkpoint. This may induce high run-time overhead due to the possibly excessive number of extra forced che...
Publication date: 2000